Abstract

We give the first nontrivial upper bounds on the average sensitivity and noise sensitivity of
polynomial threshold functions. More specifically, for a Boolean function $f$ on $n$ variables equal
to the sign of a real, multivariate polynomial of total degree $d$ we prove

• The average sensitivity of $f$ is at most $O(n^{1-1/(4d+6)})$ (we also give a combinatorial proof
of the bound $O(n^{1-1/2^d})$).

• The noise sensitivity of $f$ with noise rate $\delta$ is at most $O(\delta^{1/(4d+6)})$.

Previously, only bounds for the degree $d = 1$ case were known ($O(\sqrt{n})$ and $O(\sqrt{\delta})$, for
average and noise sensitivity respectively).

We highlight some applications of our results in learning theory where our bounds immedi- 
ately yield new agnostic learning algorithms and resolve an open problem of Klivans et al. 

The proofs of our results use (i) the invariance principle of Mossel et al., (ii) the
anti-concentration properties of polynomials in Gaussian space due to Carbery and Wright and (iii)
new structural theorems about random restrictions of polynomial threshold functions obtained
via hypercontractivity.

These structural results may be of independent interest, as they provide a generic template 
for transforming problems related to polynomial threshold functions defined on the Boolean 
hypercube to polynomial threshold functions defined in Gaussian space. 



1 Introduction 



1.1 Background 

Let $P$ be a real, multivariate polynomial of degree $d$, and let $f = \mathrm{sign}(P)$. We say that the Boolean
function $f$ is a polynomial threshold function (PTF) of degree $d$. PTFs play an important role in
computational complexity with applications in circuit complexity [ABFR94, Bei93], learning
theory [KS04, KOS04], communication complexity [She08, She09], and quantum computing [BBC+01].
While many interesting properties (e.g., Fourier spectra, influence, sensitivity) have been
characterized for the case $d = 1$ of linear threshold functions (LTFs), very little is known for degrees 2
and higher. Gotsman and Linial [GL94] conjectured, for example, that the average sensitivity of a
degree $d$ PTF is $O(d\sqrt{n})$. In this work, we take a step towards resolving this conjecture and
give the first nontrivial bounds on the average sensitivity and noise sensitivity of degree $d$ PTFs
(Theorem 1.6).

Average sensitivity [BL85] and noise sensitivity [KKL88, BKS99] are two fundamental quantities
that arise in the analysis of Boolean functions. Roughly speaking, the average sensitivity of a
Boolean function $f$ measures the expected number of bit positions that change the sign of $f$ for
a randomly chosen input, and the noise sensitivity of $f$ measures the probability over a randomly
chosen input $x$ that $f$ changes sign if each bit of $x$ is flipped independently with probability $\delta$ (we
give formal definitions below).

Bounds on the average and noise sensitivity of Boolean functions have direct applications in 
hardness of approximation [Has01, KKMO07], hardness amplification [O'D04], circuit complexity
[LMN93], the theory of social choice [Kal05], and quantum complexity [Shi00]. In this paper, we
focus on applications in learning theory, where it is known that bounds on the noise sensitivity of a 
class of Boolean functions yield learning algorithms for the class that succeed in harsh noise models 
(i.e., work in the agnostic model of learning) [KKMS08]. We obtain the first efficient algorithms 
for agnostically learning PTFs with respect to the uniform distribution on the hypercube. We 
also give efficient algorithms for agnostically learning ellipsoids in $\mathbb{R}^n$ with respect to the Gaussian
distribution, resolving an open problem of Klivans et al. [KOS08]. We discuss these learning theory 
applications in Section 2. 

1.2 Main Definitions and Results 

We begin by defining the (Boolean) noise sensitivity of a Boolean function: 

Definition 1.1 (Boolean noise sensitivity). Let $f$ be a Boolean function $f : \{1,-1\}^n \to \{1,-1\}$.
For any $\delta \in (0,1)$, let $X$ be a random element of the hypercube $\{1,-1\}^n$ and $Z$ a $\delta$-perturbation of
$X$ defined as follows: for each $i$ independently, $Z_i$ is set to $X_i$ with probability $1-\delta$ and $-X_i$ with
probability $\delta$. The noise sensitivity of $f$, denoted $\mathrm{NS}_\delta(f)$, for noise $\delta$ is then defined as follows:

$\mathrm{NS}_\delta(f) = \Pr[f(X) \neq f(Z)].$
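
To make the definition of Boolean noise sensitivity concrete, the following Python sketch computes it exactly by enumerating every input and every flip pattern, so it is only feasible for small $n$. The function names (`noise_sensitivity`, `dictator`, `parity`) are illustrative choices of ours and do not appear in the paper.

```python
from itertools import product


def noise_sensitivity(f, n, delta):
    """Exact Pr[f(X) != f(Z)] for uniform X in {1,-1}^n and a delta-perturbation Z of X."""
    total = 0.0
    for x in product((1, -1), repeat=n):
        fx = f(x)
        for flips in product((0, 1), repeat=n):
            k = sum(flips)
            # Probability of this particular set of flipped coordinates.
            prob = (delta ** k) * ((1 - delta) ** (n - k))
            z = tuple(-xi if fl else xi for xi, fl in zip(x, flips))
            if f(z) != fx:
                total += prob
    return total / 2 ** n


def dictator(x):
    """Degree-1 PTF sign(x_1)."""
    return x[0]


def parity(x):
    """Product of all coordinates (a degree-n PTF)."""
    result = 1
    for xi in x:
        result *= xi
    return result
```

For a dictator the value is exactly $\delta$, while for parity on $n$ bits it is $(1-(1-2\delta)^n)/2$, which illustrates how the noise sensitivity can grow with the degree.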

Intuitively, the Boolean noise sensitivity of $f$ measures the probability that $f$ changes value
when a random input to $f$ is perturbed slightly. In order to analyze Boolean noise sensitivity, we
will also need to analyze the Gaussian noise sensitivity, which is defined similarly, but the random
variables $X$ and $Z$ are drawn from a multivariate Gaussian distribution. Let $\mathcal{N} = \mathcal{N}(0,1)$ denote
the univariate Gaussian distribution on $\mathbb{R}$ with mean 0 and variance 1.

Definition 1.2 (Gaussian noise sensitivity). Let $f : \mathbb{R}^n \to \{-1,1\}$ be any Boolean function on $\mathbb{R}^n$.
Let $X, Y$ be two independent random variables drawn from the multivariate Gaussian distribution
$\mathcal{N}^n$ and $Z$ a $\delta$-perturbation of $X$ defined by $Z = (1-\delta)X + \sqrt{2\delta - \delta^2}\, Y$. The Gaussian noise
sensitivity of $f$, denoted $\mathrm{GNS}_\delta(f)$, for noise $\delta$ is defined as follows:

$\mathrm{GNS}_\delta(f) = \Pr[f(X) \neq f(Z)].$
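
As an illustration of the Gaussian definition, the sketch below estimates the Gaussian noise sensitivity by Monte Carlo sampling, using only the Python standard library. The sample count, the seed, and the names `gaussian_noise_sensitivity` and `halfspace` are assumptions made for this example. For a halfspace through the origin the estimate can be compared with the classical closed form $\arccos(1-\delta)/\pi$ (Sheppard's formula), since the two Gaussians $w \cdot X$ and $w \cdot Z$ have correlation $1-\delta$.

```python
import math
import random


def gaussian_noise_sensitivity(f, n, delta, samples=20000, seed=0):
    """Monte Carlo estimate of Pr[f(X) != f(Z)] with Z = (1-delta) X + sqrt(2 delta - delta^2) Y."""
    rng = random.Random(seed)
    rho = math.sqrt(2 * delta - delta * delta)
    disagreements = 0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [(1 - delta) * xi + rho * yi for xi, yi in zip(x, y)]
        if f(x) != f(z):
            disagreements += 1
    return disagreements / samples


def halfspace(x):
    """A degree-1 PTF: sign(x_1 + x_2)."""
    return 1 if x[0] + x[1] >= 0 else -1
```

The estimate is only accurate up to sampling error of order $1/\sqrt{\text{samples}}$, so it should be read as a sanity check rather than an exact value.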

It is well known that the Boolean and Gaussian noise sensitivity of LTFs are at most $O(\sqrt{\delta})$.
Our results give the first nontrivial bounds for degrees 2 and higher in both the Gaussian and 
Boolean cases, with the Gaussian case being considerably easier to handle than the Boolean case. 

Theorem 1.3 (Boolean noise sensitivity). For any degree $d$ PTF $f : \{1,-1\}^n \to \{1,-1\}$ and
$0 < \delta < 1$,

$\mathrm{NS}_\delta(f) = 2^{O(d)} \cdot \delta^{1/(4d+6)}.$



For the Gaussian case, we get a slightly better dependence on the degree d: 

Theorem 1.4 (Gaussian noise sensitivity). For any degree $d$ polynomial $P$ such that $P$ is either
multilinear or corresponds to an ellipsoid, the following holds for the corresponding PTF $f =
\mathrm{sign}(P)$. For all $0 < \delta < 1$,

$\mathrm{GNS}_\delta(f) = 2^{O(d)} \cdot$ .

Diakonikolas et al. [DRST09] prove that a similar bound holds for all degree d PTFs. Our next 
set of results bound the average sensitivity or total influence of degree d PTFs. 

Definition 1.5 (average sensitivity). Let $f$ be a Boolean function, and let $X$ be a random element
of the hypercube $\{1,-1\}^n$. Let $X^{(i)} \in \{1,-1\}^n$ be such that $X^{(i)}_i = -X_i$ and $X^{(i)}_j = X_j$ for $j \neq i$.
Then, the influence of the $i$th variable is defined by

$I_i(f) = \Pr[f(X) \neq f(X^{(i)})].$

The sum of all the influences is referred to as the average sensitivity of the function $f$,

$\mathrm{AS}(f) = \sum_i I_i(f).$
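
The influences and the average sensitivity can be computed by brute force for small $n$. The following Python sketch is our own illustration (the names `influences`, `average_sensitivity`, `majority` and `parity` are not from the paper) and enumerates all $2^n$ inputs.

```python
from itertools import product


def influences(f, n):
    """Return [I_1(f), ..., I_n(f)] for f: {1,-1}^n -> {1,-1} by exhaustive enumeration."""
    counts = [0] * n
    for x in product((1, -1), repeat=n):
        fx = f(x)
        for i in range(n):
            # x with the i-th coordinate negated.
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(flipped) != fx:
                counts[i] += 1
    return [c / 2 ** n for c in counts]


def average_sensitivity(f, n):
    """Sum of all influences."""
    return sum(influences(f, n))


def majority(x):
    """An LTF: sign(x_1 + ... + x_n), intended for odd n."""
    return 1 if sum(x) > 0 else -1


def parity(x):
    """Product of all coordinates; every coordinate has influence 1."""
    result = 1
    for xi in x:
        result *= xi
    return result
```

On three bits, each variable of Majority has influence $1/2$ (the other two bits must disagree), while parity attains the trivial upper bound $\mathrm{AS}(f) = n$.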

Clearly, for any function $f$, $\mathrm{AS}(f)$ is at most $n$. It is well known that the average sensitivity
of "unate" functions (functions monotone in each coordinate), and thus of LTFs in particular, is
$O(\sqrt{n})$. This bound is tight as the Majority function has average sensitivity $\Theta(\sqrt{n})$. As mentioned
before, Gotsman and Linial [GL94] conjectured in 1994 that the average sensitivity of any degree
$d$ PTF $f$ is $O(d\sqrt{n})$. We are not aware of any progress on this conjecture until now, with no $o(n)$
bounds known.

We give two upper bounds on the average sensitivity of degree d PTFs. We first use a simple 
translation lemma for bounding average sensitivity in terms of noise sensitivity of a Boolean function 
and Theorem 1.3 to obtain the following bound. 

Theorem 1.6 (average sensitivity). For a degree $d$ PTF $f : \{1,-1\}^n \to \{1,-1\}$,

$\mathrm{AS}(f) = 2^{O(d)} \cdot n^{1-1/(4d+6)}.$



We also give an elementary combinatorial argument to show that the average sensitivity of
any degree $d$ PTF is at most $3n^{1-1/2^d}$. The combinatorial proof is based on the following lemma
for general Boolean functions that may prove useful elsewhere. For $x \in \{1,-1\}^n$ and $i \in [n]$, let
$x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$.

Lemma 1.7. For Boolean functions $f_i : \{1,-1\}^n \to \{1,-1\}$ with $f_i$ not depending on the $i$'th
coordinate $x_i$, and $X \in_u \{1,-1\}^n$,



$\mathbb{E}_X\left[\,\left|\sum_i X_i f_i(X_{-i})\right|\,\right]^2 \le 2 \sum_i \mathrm{AS}(f_i) + n.$



We believe that when the functions $f_i$ in the above lemma are LTFs, the above bound can be
improved to $O(n)$, which in turn would imply the Gotsman-Linial conjecture for quadratic threshold
functions.



1.3 Random Restrictions of PTFs: a structural result

An important ingredient of our sensitivity bounds for PTFs is a set of new structural theorems about
random restrictions of PTFs obtained via hypercontractivity. The structural results we obtain can 
be seen as part of the high level "randomness vs structure" paradigm that has played a fundamental 
role in many recent breakthroughs in additive number theory and combinatorics. Specifically, we 
obtain the following structural result (Lemmas 5.1 and 5.2): for any PTF, there exists a small set 
of variables such that with at least a constant probability, any random restriction of these variables 
satisfies one of the following: (1) the restricted polynomial is "regular" in the sense that no single 
variable has large influence or (2) the sign of the restricted polynomial is a very biased function. 

We remark that our structural results, though motivated by similar results of Servedio [Ser07] 
and Diakonikolas et al. [DGJ+09] for the simpler case of LTFs, do not follow from a generalization
of their arguments for LTFs to PTFs. The structural results for random restrictions of low-degree 
PTFs provide a reasonably generic template for reducing problems involving arbitrary PTFs to 
ones on regular PTFs. In fact, these structural properties are used precisely for the above reason 
both in this work and in a parallel work by one of the authors, Meka and Zuckerman [MZ09] to 
construct pseudorandom generators for PTFs. 



1.4 Related Work 

Independent of this work, Diakonikolas, Raghavendra, Servedio, and Tan [DRST09] have obtained 
nearly identical results to ours for both the average and noise sensitivity of PTFs. The broad outline 
of their proof is also similar to ours. In our proof, we first obtain bounds on noise sensitivity and 
then move to average sensitivity using a translation lemma. On the other hand, Diakonikolas et 
al. [DRST09] first obtain bounds on the average sensitivity of PTFs and then use a generalization 
of Peres' argument [Per04] for LTFs to move from average sensitivity to noise sensitivity. 

Regarding our structural result described in Section 1.3, Diakonikolas, Servedio, Tan and Wan 
[DSTW09] have independently obtained similar results to ours. As an application, they prove the 
existence of low-weight approximators for polynomial threshold functions. 



1.5 Proof Outline 

The proofs of our theorems are inspired by the use of the invariance principle in the proof of the 
"Majority is Stablest" theorem [MOO05]. As in the proof of the "Majority is Stablest" theorem, 
our main technical tools are the invariance principle and the anti-concentration bounds (also called 
small ball probabilities) of Carbery and Wright [CW01]. 

Bounding the probability that a threshold function changes value either when it is perturbed
slightly (in the case of noise sensitivity) or when a variable is flipped (average sensitivity) involves
bounding probabilities of the form $\Pr[|Q(X)| < |R(X)|]$ where $Q(X), R(X)$ are low degree
polynomials and $R$ has small $\ell_2$-norm relative to that of $Q$. The event $|Q(X)| < |R(X)|$ implies that
either $|Q(X)|$ is small or $|R(X)|$ is large. In other words, for every $\gamma$

$\Pr[|Q(X)| < |R(X)|] \le \Pr[|Q(X)| < \gamma] + \Pr[|R(X)| > \gamma].$

Since $R$ has small norm, the second quantity in the above expression can be easily bounded using
a tail bound (even Markov's inequality suffices). Bounding the first quantity is trickier. Our
first observation is that if the random variable $X$ were distributed according to the Gaussian
distribution as opposed to the uniform distribution on the hypercube, bounds on probabilities
of the form $\Pr[|Q(X)| < \gamma]$ immediately follow from the anti-concentration bounds of Carbery
and Wright [CW01]. We then transfer these bounds to the Boolean setting using the invariance
principle.

Unfortunately, the invariance principle holds only for regular polynomials (i.e., polynomials 
in which no single variable has large influence). We thus obtain the required bounds on noise 
sensitivity and average sensitivity for the special case of regular PTFs. We then extend these
results to an arbitrary PTF $f$ using our structural results on random restrictions of the PTF $f$.
The structural results state that either the restricted PTF is a regular polynomial or is a very 
biased function. In the former case, we resort to the above argument for regular PTFs and bound 
the noise sensitivity of the given PTF. In the latter case, we merely note that the noise sensitivity 
of a biased function can be easily bounded. This in turn lets us extend the results for regular PTFs 
to all PTFs. 

2 Learning Theory Applications 

In this section, we briefly elaborate on the learning theory applications of our results. Our bounds 
on Boolean and Gaussian noise sensitivity imply learning results in the challenging agnostic model 
of learning of Haussler [Hau92] and Kearns, Schapire and Sellie [KSS94] which we define below. 

Definition 2.1. Let $\mathcal{D}$ be an arbitrary distribution on $X$ and $\mathcal{C}$ a class of Boolean functions $f :
X \to \{-1,1\}$. For $\delta, \varepsilon \in (0,1)$, we say that algorithm $A$ is a $(\delta,\varepsilon)$-agnostic learning algorithm for $\mathcal{C}$
with respect to $\mathcal{D}$ if the following holds. For any distribution $\mathcal{D}'$ on $X \times \{-1,1\}$ whose marginal over
$X$ is $\mathcal{D}$, if $A$ is given access to a set of labeled examples $(x,y)$ drawn from $\mathcal{D}'$, then with probability
at least $1-\delta$ algorithm $A$ outputs a hypothesis $h : X \to \{-1,1\}$ such that

$\Pr_{(x,y) \sim \mathcal{D}'}[h(x) \neq y] \le \mathrm{opt} + \varepsilon$

where opt is the error made by the best classifier in $\mathcal{C}$, that is, $\mathrm{opt} = \inf_{g \in \mathcal{C}} \Pr_{(x,y) \sim \mathcal{D}'}[g(x) \neq y]$.

Kalai, Klivans, Mansour and Servedio [KKMS08] showed that the existence of low-degree real
valued polynomial $\ell_2$-approximators to a class of functions implies agnostic learning algorithms for
the class. In an earlier result, Klivans, O'Donnell and Servedio [KOS04] gave a precise
relationship between polynomial approximation and noise sensitivity, essentially showing that small noise
sensitivity bounds imply good low-degree polynomial $\ell_2$-approximators.

Combining these two results, it follows that bounding the noise sensitivity (either Boolean 
or Gaussian) of a concept class C yields an agnostic learning algorithm for C (with respect to 
the appropriate distribution). Thus, using our bounds on noise sensitivity of PTFs, we obtain 
corresponding learning algorithms for PTFs with respect to the uniform distribution over the 
hypercube.

Theorem 2.2. The concept class of degree $d$ PTFs is agnostically learnable to within $\varepsilon$ with respect
to the uniform distribution on $\{-1,1\}^n$ in time $n^{1/\varepsilon^{O(d)}}$.

These are the first polynomial-time algorithms for agnostically learning constant degree PTFs 
with respect to the uniform distribution on the hypercube (to within any constant error parameter).
Previously, Klivans et al. [KOS08] had shown that quadratic (degree 2) PTFs corresponding to
spheres are agnostically learnable with respect to spherical Gaussians on $\mathbb{R}^n$. Our bounds on the
Gaussian noise sensitivity of ellipsoids imply that this result can be extended to all ellipsoids with 
respect to (not necessarily spherical) Gaussian distributions thus resolving an open problem of 
Klivans et al. [KOS08]. 

It is implicit from a recent paper of Blais, O'Donnell and Wimmer [BOW08] that bounding the 
Boolean noise sensitivity for a concept class C yields non-trivial learning algorithms for a very broad 
class of discrete and continuous product distributions. We believe this is additional motivation for 
obtaining bounds on a function's Boolean noise sensitivity. 

3 Organization 

The rest of the paper is organized as follows. We introduce the necessary notation and preliminaries 
in Section 4. We then present the structural results on random restrictions of PTFs (Lemmas 5.1 
and 5.2) in Section 5. In Section 6 we present our analysis of Gaussian noise sensitivity followed by 
the analysis of Boolean noise sensitivity in Section 7. We remark that the analysis of the Gaussian 
noise sensitivity is simpler than the Boolean noise sensitivity analysis, since the Boolean case, in 
some sense, reduces to the "regular" or Gaussian case. We then present our bounds on average 
sensitivity of PTFs in Section 8. 

4 Notation and Preliminaries 

We will consider functions/polynomials over $n$ variables $X_1, \ldots, X_n$. Corresponding to any set
$I \subseteq [n]$ (possibly multi-set), there is a monomial $X^I$ defined as $X^I = \prod_{i \in I} X_i$. The degree of the
monomial $X^I$ is the size of the set $I$, denoted by $|I|$. Note that if $I$ is a "regular" set (as opposed to
a multi-set), then the monomial $X^I$ is linear in each of the participating variables $X_i$, $i \in I$.

A polynomial of degree $d$ is a linear combination of monomials of degree at most $d$, that is,
$P(X_1, \ldots, X_n) = \sum_{I \subseteq [n], |I| \le d} a_I X^I$. The $a_I$'s are called the coefficients of the polynomial $P$. By
convention, we set $a_I = 0$ for all other $I$. If the above summation is only over sets $I$ and not
multi-sets, then the polynomial is said to be multilinear. Observe that while working over the hypercube,
it suffices to consider only multilinear polynomials. We use the following notations throughout.

1. Unless otherwise stated, we work with a PTF $f$ of degree $d$ and a degree $d$ polynomial $P(X) =
\sum_I a_I X^I$ with zero constant term (i.e., $a_\emptyset = 0$) such that $f(X_1, \ldots, X_n) = \mathrm{sign}(P(X_1, \ldots, X_n) -
\theta)$. In case of ambiguity, we will refer to the coefficients $a_I$ as $a_I(P)$.

2. For a polynomial $P$ as above and an underlying distribution over $X = (X_1, \ldots, X_n)$, the
$\ell_2$-norm of the polynomial over $X$ is defined by $\|P\|^2 = \mathbb{E}[P(X)^2]$. Note that if $P$ is a
multilinear polynomial and the distribution is either the multivariate Gaussian $\mathcal{N}^n$ or the
uniform distribution over the hypercube, then $\|P\|^2 = \sum_I a_I^2$.

3. For $i \in [n]$ and $x^i = (x_1, \ldots, x_i) \in \{1,-1\}^i$, the function $f_{x^i} : \{1,-1\}^{n-i} \to \{1,-1\}$ is defined by
$f_{x^i}(X_{i+1}, \ldots, X_n) = \mathrm{sign}(P(x_1, \ldots, x_i, X_{i+1}, \ldots, X_n) - \theta)$.






4. For $i \in [n]$, $P|_i(X_1, \ldots, X_i) = \sum_{I \subseteq [i]} a_I X^I$ is the restriction of $P$ to the variables $X_1, \ldots, X_i$.

5. For a multi-set $S$, $x \in_u S$ denotes a uniformly chosen element from $S$.

6. For clarity, we suppress the exact dependence of the constants on the degree $d$ in this extended
abstract; a more careful examination of our proofs shows that all constants depending on the
degree $d$ are at worst $2^{O(d)}$.
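
The restricted function $f_{x^i}$ from item 3 is obtained by substituting the fixed values into $P$ and collecting terms. The Python sketch below shows one way to do this for a multilinear polynomial stored as a dictionary from index tuples to coefficients; this representation, the 0-based indexing, and the name `restrict` are assumptions of ours, not notation from the paper.

```python
def restrict(coeffs, assignment):
    """Fix variables 0..k-1 of a multilinear polynomial to the given +/-1 values.

    coeffs maps a sorted tuple of variable indices I to the coefficient a_I.
    Returns the coefficient dictionary of the polynomial in the remaining variables
    (the empty tuple () holds the constant term produced by the restriction).
    """
    k = len(assignment)
    restricted = {}
    for monomial, a in coeffs.items():
        value = a
        for i in monomial:
            if i < k:
                value *= assignment[i]  # substitute the fixed +/-1 value
        free = tuple(i for i in monomial if i >= k)
        restricted[free] = restricted.get(free, 0.0) + value
    return restricted


# P(x0, x1, x2) = x0*x1 + 2*x0*x2 + x2
example = {(0, 1): 1.0, (0, 2): 2.0, (2,): 1.0}
```

For instance, setting $x_0 = -1$ in the example polynomial gives $-x_1 - x_2$, and setting $x_0 = 1$ gives $x_1 + 3x_2$.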

Definition 4.1. A partial assignment $x^i = (x_1, \ldots, x_i)$ is $\varepsilon$-determining for $f$ if there exists
$b \in \{1,-1\}$ such that $\Pr_{(X_{i+1}, \ldots, X_n) \in_u \{1,-1\}^{n-i}}[f_{x^i}(X_{i+1}, \ldots, X_n) \neq b] \le \varepsilon$.

We now define regular polynomials which play an important role in all our results. Intuitively,
a polynomial is regular if no variable has high influence. For a polynomial $Q$, the weight of the $i$th
coordinate is defined by $w_i^2(Q) = \sum_{I \ni i} a_I^2$. For $i \in [n]$, let $\sigma_i(Q)^2 = \sum_{j \ge i} w_j^2(Q)$.

Definition 4.2 (regular polynomials). A multilinear polynomial $Q$ is $\varepsilon$-regular if $\sum_i w_i^4(Q) \le
\varepsilon^2 \left(\sum_i w_i^2(Q)\right)^2 = \varepsilon^2 \sigma_1^4(Q)$. A PTF $f(x) = \mathrm{sign}(Q(x) - \theta)$ is $\varepsilon$-regular if $Q$ is $\varepsilon$-regular.

We also assume without loss of generality that the variables are ordered such that $w_1(P) \ge
w_2(P) \ge \cdots \ge w_n(P)$.
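
As a small illustration of the weights and of regularity, the following Python sketch computes the squared weights $w_i^2$ of a multilinear polynomial and the smallest $\varepsilon$ for which $\sum_i w_i^4 \le \varepsilon^2 (\sum_i w_i^2)^2$ holds. The dictionary representation (index tuples mapped to coefficients, 0-based indices) and the function names are our own assumptions.

```python
import math


def squared_weights(coeffs, n):
    """w_i^2 = sum of a_I^2 over all monomials I containing variable i."""
    w_sq = [0.0] * n
    for monomial, a in coeffs.items():
        for i in monomial:
            w_sq[i] += a * a
    return w_sq


def regularity(coeffs, n):
    """Smallest eps such that sum_i w_i^4 <= eps^2 * (sum_i w_i^2)^2."""
    w_sq = squared_weights(coeffs, n)
    second = sum(w_sq)
    fourth = sum(v * v for v in w_sq)
    return math.sqrt(fourth) / second


# x0 + x1 + x2 + x3: all weights equal, so the polynomial is (1/sqrt(n))-regular.
balanced = {(0,): 1.0, (1,): 1.0, (2,): 1.0, (3,): 1.0}
# A single variable carries all the weight: only 1-regular.
dictator_poly = {(0,): 1.0}
```

The two examples show the extremes: equal weights give regularity $1/\sqrt{n}$, while a polynomial dominated by one variable is far from regular.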

We repeatedly use three powerful tools: (2, 4)-hypercontractivity, the invariance principle of 
Mossel et al. [MOO05] and the anti-concentration bounds of Carbery and Wright [CW01]. We 
state the relevant results below. 

Lemma 4.3 ((2, 4)-hypercontractivity). If $Q, R$ are degree $d$ multilinear polynomials, then for
$X \in_u \{1,-1\}^n$, $\mathbb{E}_X[Q^2 \cdot R^2] \le 9^d \cdot \mathbb{E}_X[Q^2] \cdot \mathbb{E}_X[R^2]$. In particular, $\mathbb{E}[Q^4] \le 9^d \cdot \mathbb{E}[Q^2]^2$.
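
The special case $\mathbb{E}[Q^4] \le 9^d\, \mathbb{E}[Q^2]^2$ can be checked numerically on small examples. The Python sketch below computes exact moments of a multilinear polynomial over the uniform distribution on the hypercube by enumeration; the representation and the names `evaluate` and `moment` are illustrative assumptions, and the check is of course no substitute for the proof.

```python
from itertools import product
from math import prod


def evaluate(coeffs, x):
    """Evaluate sum_I a_I * prod_{i in I} x_i at the point x."""
    return sum(a * prod(x[i] for i in monomial) for monomial, a in coeffs.items())


def moment(coeffs, n, k):
    """Exact E[Q(X)^k] for X uniform on {1,-1}^n."""
    total = sum(evaluate(coeffs, x) ** k for x in product((1, -1), repeat=n))
    return total / 2 ** n


# Q = x0*x1 + x1*x2 + x0*x2, a degree-2 multilinear polynomial.
quadratic = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
second = moment(quadratic, 3, 2)
fourth = moment(quadratic, 3, 4)
```

Here $Q$ takes the value 3 on the two constant inputs and $-1$ elsewhere, so $\mathbb{E}[Q^2] = 3$ and $\mathbb{E}[Q^4] = 21$, comfortably below $9^2 \cdot 3^2 = 729$.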

The following anti-concentration bound is a special case of Theorem 8 of [CW01] (in their
notation, set $q = 2d$ and the log-concave distribution $\mu$ to be $\mathcal{N}^n$).

Theorem 4.4 (Carbery-Wright anti-concentration bound). There exists an absolute constant $C$
such that for any polynomial $Q$ of degree at most $d$ with $\|Q\| = 1$ and any interval $I \subseteq \mathbb{R}$ of length
$\alpha$, $\Pr_{X \leftarrow \mathcal{N}^n}[Q(X) \in I] \le C d\, \alpha^{1/d}$.

The following result due to Mossel et al. [MOO05] generalizes the classical quantitative central
limit theorem for sums of independent variables, the Berry-Esseen theorem, to low-degree polynomials
over independent variables.

Theorem 4.5 (Mossel et al.). There exists a universal constant $C$ such that the following holds.
For any $\varepsilon$-regular multilinear polynomial $P$ of degree at most $d$ with $\|P\| = 1$ and $t \in \mathbb{R}$,

$\left| \Pr_{X \in_u \{1,-1\}^n}[P(X) \le t] - \Pr_{Y \leftarrow \mathcal{N}^n}[P(Y) \le t] \right| \le C d\, \varepsilon^{2/(4d+1)}.$

The result stated in [MOO05] uses $\max_i w_i^2(P)$ as the notion of regularity instead of $\sum_i w_i^4(P)$
as we do. However, their proof extends straightforwardly to the above.

5 Random Restrictions of PTFs 

We now establish our structural results on random restrictions of low-degree PTFs. The use
of critical indices ($K(P,\varepsilon)$) in our analysis is motivated by the results of Servedio [Ser07] and
Diakonikolas et al. [DGJ+09] who obtain similar results for LTFs. At a high level, we show the
following.

Given any $\varepsilon > 0$, define the $\varepsilon$-critical index of a multilinear polynomial $P$, $K = K(P,\varepsilon)$, to be
the least index $i$ such that $w_j^2(P) \le \varepsilon^2 \sigma_{i+1}^2(P)$ for all $j > i$. We consider two cases depending on
how large $K(P,\varepsilon)$ is and roughly, show the following (here $c, \alpha > 0$ are some universal constants).

1. $K \le 1/\varepsilon^{cd}$. In this case we show that for $x^K = (x_1, \ldots, x_K) \in_u \{1,-1\}^K$, the PTF $f_{x^K}$ is
$\varepsilon$-regular with probability at least $\alpha$.

2. $K > 1/\varepsilon^{cd}$. In this case we show that with probability at least $\alpha$, the value of the threshold
function is determined by the top $L = 1/\varepsilon^{cd}$ variables.
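
The critical index can be computed directly from the squared weights. The Python sketch below reads the defining condition as $w_j^2 \le \varepsilon^2 \sigma_{i+1}^2$ for all $j > i$ (an assumption about the intended indexing, chosen to match how the index is used in the proof of Lemma 5.1), sorts the weights in decreasing order, and returns the least such $i$. The function name and the use of a plain list of squared weights are ours.

```python
def critical_index(w_sq, eps):
    """Least i in {0,...,n} with w_j^2 <= eps^2 * sigma_{i+1}^2 for all j > i (1-based j).

    w_sq is a list of squared weights w_1^2, ..., w_n^2 in any order.
    """
    w = sorted(w_sq, reverse=True)
    n = len(w)
    # tail[i] = w[i] + ... + w[n-1], i.e. sigma_{i+1}^2 in the paper's 1-based notation.
    tail = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + w[i]
    for i in range(n + 1):
        if all(w[j] <= eps * eps * tail[i] for j in range(i, n)):
            return i
    return n  # unreachable: the condition is vacuous for i = n
```

A polynomial with many equal weights has critical index 0 (it is already regular), a single heavy variable followed by equal weights gives index 1, and geometrically decreasing weights push the index all the way to $n$.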

More concretely, we show the following. 

Lemma 5.1. For every integer $d$, there exist constants $a_d \in \mathbb{R}$, $\gamma_d > 0$ such that for any multilinear
polynomial $P$ of degree at most $d$ and $K = K(P,\varepsilon)$ as defined above, the following holds. The
polynomial $P_{x^K}(Y_{K+1}, \ldots, Y_n) = P(x_1, \ldots, x_K, Y_{K+1}, \ldots, Y_n)$ in variables $Y_{K+1}, \ldots, Y_n$ obtained
by randomly choosing $x^K = (x_1, \ldots, x_K) \in_u \{1,-1\}^K$ is $a_d\varepsilon$-regular with probability at least $\gamma_d$.

Lemma 5.2. For every $d$, there exist constants $b_d, c_d \in \mathbb{R}$, $\delta_d > 0$, such that for any multilinear
polynomial $P$ of degree at most $d$ the following holds. If $K(P,\varepsilon) \ge c_d \log(1/\varepsilon)/\varepsilon^2 = L$, then a
random partial assignment $(x_1, \ldots, x_L) \in_u \{1,-1\}^L$ is $b_d\varepsilon$-determining for $P$ with probability at
least $\delta_d$.

To prove the above structural properties we need the following simple lemmas. 

Lemma 5.3 ([AGK04, Lemma 3.2]). Let $A$ be a real valued random variable satisfying $\mathbb{E}[A] = 0$,
$\mathbb{E}[A^2] = \sigma^2$ and $\mathbb{E}[A^4] \le b\sigma^4$. Then, $\Pr[A \ge \sigma/(4\sqrt{b})] \ge 1/(4^{4/3} b)$.

Lemma 5.4. For $d > 0$ there exist constants $\alpha_d, \beta_d > 0$ such that for any degree at most $d$
polynomial $Q$, and $X \in_u \{1,-1\}^n$, $\Pr[Q(X) \ge \mathbb{E}[Q] + \alpha_d \sigma(Q)] \ge \beta_d$, where $\sigma^2(Q)$ is the variance
of $Q(X)$, that is, $\sigma^2(Q) = \|Q\|^2 - (\mathbb{E}_X[Q])^2$. In particular, $\Pr[Q(X) \ge \mathbb{E}[Q]] \ge \beta_d$.

Proof. Let random variable $A = Q(X) - \mathbb{E}_X[Q(X)]$. Then, $\mathbb{E}[A] = 0$, $\mathbb{E}[A^2] = \sigma^2(Q)$ and by
(2,4)-hypercontractivity, $\mathbb{E}[A^4] \le 9^d\, \mathbb{E}[A^2]^2 = 9^d \sigma^4(Q)$. The claim now follows from Lemma 5.3. □

5.1 Proof of Lemma 5.1 

Proof. Let $X = (X_1, \ldots, X_K)$. We prove the lemma as follows: (1) Bound the expectation of
$\sum_{j>K} w_j^4(P_X)$ using hypercontractivity and use Markov's inequality to show that with high
probability $\sum_{j>K} w_j^4(P_X)$ is small. (2) Use the fact that $\sigma_{K+1}^2(P_X) = \sum_{j>K} w_j^2(P_X)$ is a degree at most
$2d$ polynomial in $X$ and Lemma 5.4 to lower bound the probability that $\sigma_{K+1}^2(P_X)$ is large. Let

$P_X(Y_{K+1}, \ldots, Y_n) = P(X_1, \ldots, X_K, Y_{K+1}, \ldots, Y_n) = R(X_1, \ldots, X_K) + \sum_{J \subseteq [K+1,n],\, 0 < |J| \le d} Q_J(X_1, \ldots, X_K) \prod_{j \in J} Y_j.$

We now bound $\mathbb{E}\left[\sum_{j>K} w_j^4(P_X)\right]$. Fix a $j > K$ and observe that $w_j^2(P_X) = \sum_{J \ni j} Q_J^2(X)$. Thus,

$\mathbb{E}_X[w_j^2(P_X)] = \sum_{J \ni j} \mathbb{E}_X[Q_J^2(X)] = \sum_{J \ni j} \|Q_J\|^2 = w_j^2(P).$ (5.1)

Further, by (2,4)-hypercontractivity, Lemma 4.3,

$\mathbb{E}_X[w_j^4(P_X)] = \mathbb{E}_X\left[\sum_{J_1, J_2 \ni j} Q_{J_1}^2(X)\, Q_{J_2}^2(X)\right] \le \sum_{J_1, J_2 \ni j} 9^d \|Q_{J_1}\|^2 \|Q_{J_2}\|^2 = 9^d w_j^4(P).$

Hence, $\mathbb{E}\left[\sum_{j>K} w_j^4(P_X)\right] \le 9^d \sum_{j>K} w_j^4(P)$. Now, from the definition of $K(P,\varepsilon)$, $w_j^2(P) \le
\varepsilon^2 \sigma_{K+1}^2(P)$ for all $j > K$. Thus,

$\sum_{j>K} w_j^4(P) \le \varepsilon^2 \sigma_{K+1}^2(P) \sum_{j>K} w_j^2(P) = \varepsilon^2 \sigma_{K+1}^4(P).$

Combining the above inequalities and applying Markov's inequality we get

$\Pr_X\left[\sum_{j>K} w_j^4(P_X) \ge \gamma\, 9^d \varepsilon^2 \sigma_{K+1}^4(P)\right] \le 1/\gamma.$ (5.2)

Observe that $Q(X) = \sum_{j>K} w_j^2(P_X)$ is a degree at most $2d$ polynomial in $X_1, \ldots, X_K$ and by
(5.1), $\mathbb{E}[Q] = \sum_{j>K} w_j^2(P) = \sigma_{K+1}^2(P)$. Thus, by applying Lemma 5.4 to $Q$, $\Pr\left[\sum_{j>K} w_j^2(P_X) \ge
\sigma_{K+1}^2(P)\right] \ge \beta_{2d}$. Setting $\gamma = 2/\beta_{2d}$ in (5.2) and using the above equation, we get

$\Pr_X\left[\sum_{j>K} w_j^4(P_X) \le a_d^2 \varepsilon^2 \left(\sum_{j>K} w_j^2(P_X)\right)^2\right] \ge \beta_{2d}/2,$

where $a_d^2 = 2 \cdot 9^d / \beta_{2d}$. Thus, the polynomial $P_X(Y_{K+1}, \ldots, Y_n)$ is $(a_d\varepsilon)$-regular with probability at
least $\gamma_d = \beta_{2d}/2$. □

5.2 Proof of Lemma 5.2 

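We use the following simple lemma.

Lemma 5.5. For $1 \le i < j \le K(P,\varepsilon)$, $\sigma_j^2(P) \le (1-\varepsilon^2)^{j-i} \sigma_i^2(P)$.

Proof. For $1 \le i \le K(P,\varepsilon)$, we have

$\sigma_i^2(P) = w_i^2(P) + \sigma_{i+1}^2(P) \ge \varepsilon^2 \sigma_i^2(P) + \sigma_{i+1}^2(P).$

Thus, $\sigma_{i+1}^2(P) \le (1-\varepsilon^2)\sigma_i^2(P)$. The lemma follows. □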



Proof of Lemma 5.2. Suppose that $K(P,\varepsilon) \ge L = c \log(1/\varepsilon)/\varepsilon^2$ for a constant $c$ to be chosen later
and let $Q(X_1, \ldots, X_n) = P(X_1, \ldots, X_n) - P|_L(X_1, \ldots, X_L)$. The proof proceeds as follows. We first
show that $\|Q\|$ is significantly smaller than $\|P|_L\|$. We then use Lemma 5.4 applied to $P|_L - \theta$ and
Markov's inequality applied to $|Q(X)|$ to show that $|P|_L(X_1, \ldots, X_L) - \theta|$ is larger than $|Q(X)|$,
so that $Q(X)$ cannot flip the sign of $P|_L(X_1, \ldots, X_L) - \theta$, with at least a constant probability. We
first bound $\|Q\|$.

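Claim 5.6. For a suitably large constant $c_d$, $\|Q\| \le \sqrt{\varepsilon}\, \alpha_d \|P|_L\|$.

Proof. Let $\alpha_d, \beta_d$ be the constants from Lemma 5.4. By definition $\|Q\|^2 = \sum_{I \not\subseteq [L]} a_I^2 \le \sigma_L^2(P)$.
Now,

$\sigma_1^2(P) \le d \sum_{I : I \cap [L] \neq \emptyset} a_I^2 + \sigma_L^2(P)$
$\le d \sum_{I : \emptyset \neq I \subseteq [L]} a_I^2 + d \sum_{I \not\subseteq [L]} a_I^2 + \sigma_L^2(P)$
$\le d \sum_{I : \emptyset \neq I \subseteq [L]} a_I^2 + d \sum_{j > L} w_j^2(P) + \sigma_L^2(P)$
$\le d \sum_{I : \emptyset \neq I \subseteq [L]} a_I^2 + (d+1)\, \sigma_L^2(P).$

Further, by Lemma 5.5, $\sigma_L^2(P) \le (1-\varepsilon^2)^{L-1} \sigma_1^2(P)$. Combining the above inequalities we get,

$\sigma_L^2(P) \le O_d\left((1-\varepsilon^2)^{L-1}\right) \sum_{I : \emptyset \neq I \subseteq [L]} a_I^2 = O_d\left((1-\varepsilon^2)^{L-1}\right) \|P|_L\|^2.$ (5.3)

Choosing $L = c_d \log(1/\varepsilon)/\varepsilon^2$ for large enough $c_d$, we get the claim. □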
By Claim 5.6 and Markov's inequality,

$\Pr_{x \in_u \{1,-1\}^n}\left[|Q(x_1, \ldots, x_n)| \ge \alpha_d \|P|_L\|\right] \le \Pr_{x \in_u \{1,-1\}^n}\left[|Q(x_1, \ldots, x_n)| \ge \|Q\|/\sqrt{\varepsilon}\right] \le \varepsilon.$ (5.4)

Let $S \subseteq \{1,-1\}^L$ be the set of all bad $x^L \in \{1,-1\}^L$ such that

$\Pr_{(X_{L+1}, \ldots, X_n) \in_u \{1,-1\}^{n-L}}\left[|Q(x_1, \ldots, x_L, X_{L+1}, \ldots, X_n)| \ge \alpha_d \|P|_L\|\right] > 2\varepsilon/\beta_d.$

Then, from (5.4) and the above equation, $\Pr_{x^L \in_u \{1,-1\}^L}[x^L \in S] \le \beta_d/2$. Now, let $T \subseteq \{1,-1\}^L$
be such that for $x^L \in T$, $|P|_L(x_1, \ldots, x_L) - \theta| \ge \alpha_d \|P|_L\|$ and $x^L \notin S$. Observe that all $x^L \in T$ are
$(2\varepsilon/\beta_d)$-determining and by Lemma 5.4 and the above equations,

$\Pr[x^L \in T] \ge \Pr\left[|P|_L(x_1, \ldots, x_L) - \theta| \ge \alpha_d \|P|_L\|\right] - \Pr[x^L \in S] \ge \beta_d/2.$

The lemma now follows. □



6 Gaussian Noise Sensitivity of PTFs 

In this section, we bound the Gaussian noise sensitivity of PTFs and thus prove Theorem 1.4. 
The proof is simpler than the Boolean case and only makes use of an anti-concentration bound for 
polynomials in Gaussian space. 

Although Theorem 1.4 was stated only for multilinear polynomials and ellipsoids, we give a 
proof below that works for all degree d polynomials using ideas from Diakonikolas et al. [DRST09], 
who were the first to prove a bound on the Gaussian noise sensitivity of general degree d polynomials 
(see remarks after the statement of Claim 6.1).

Proof of Theorem 1.4. Let $f$ be the degree $d$ PTF and $P$ the corresponding degree $d$ polynomial
such that $f(x) = \mathrm{sign}(P(x))$. We may assume without loss of generality that $P$ is normalized, i.e.,
$\|P\|^2 = \mathbb{E}[P^2(X)] = 1$.

The proof is based on the Carbery-Wright anti-concentration bound (Theorem 4.4) for degree
$d$ PTFs. Let $X, Y \sim \mathcal{N}^n$ and $Z = (1-\delta)X + \sqrt{1-(1-\delta)^2}\, Y = (1-\delta)X + \sqrt{2\delta - \delta^2}\, Y$. Let $\rho =
\sqrt{2\delta - \delta^2}$. Define the perturbation polynomial $Q(X,Y) = P(Z) - P(X) = P((1-\delta)X + \rho Y) - P(X)$.
Now, for $\gamma > 0$ to be chosen later,

$\Pr[\mathrm{sgn}(P(X)) \neq \mathrm{sgn}(P(Z))] = \Pr[\mathrm{sgn}(P(X)) \neq \mathrm{sgn}(P(X) + Q(X,Y))]$
$\le \Pr[|P(X)| \le |Q(X,Y)|]$
$\le \Pr[|P(X)| \le \gamma] + \Pr[|Q(X,Y)| > \gamma]$
$\le C_d \gamma^{1/d} + \Pr[|Q(X,Y)| > \gamma],$

where the last inequality follows from the anti-concentration bound (Theorem 4.4). In Claim 6.1,
we show that the norm $\|Q\|$ of the perturbation polynomial is at most $c_d \sqrt{\delta}$ for some constant $c_d$
(dependent on $d$). We can now apply Markov's inequality to bound the second quantity as follows.

$\Pr[|Q(X,Y)| > \gamma] \le \|Q\|^2/\gamma^2 \le c_d^2\, \delta/\gamma^2.$

Thus,

$\mathrm{GNS}_\delta(f) \le C_d \gamma^{1/d} + \frac{c_d^2\, \delta}{\gamma^2}.$

The theorem follows by setting $\gamma = \delta^{d/(2d+1)}$, in which case we get $\mathrm{GNS}_\delta(f) = O_d(\delta^{1/(2d+1)})$. □

We note that we can get a slightly stronger bound of $O_d\left(\delta^{1/2d} \sqrt{\log(1/\delta)}\right)$ if we used a stronger
tail bound instead of Markov's in the above argument.

Claim 6.1. There exists a constant $c_d$ such that $\|Q\| \le c_d \sqrt{\delta}$.

An earlier version of this paper had an error in the proof of this claim. As pointed out to us 
by the authors of [DRST09], that proof worked only for multilinear polynomials and ellipsoids. 
Diakonikolas et al. [DRST09] proved the claim for general degree d polynomials. For the sake of 
completeness, we give a simplified presentation of their proof (that works for all degree d polyno- 
mials) in Section A. 

7 Noise sensitivity of PTFs 

We now bound the noise sensitivity of PTFs and prove Theorem 1.3. We do so by first bounding 
the noise sensitivity of regular PTFs and then use the results of the previous section to reduce the 
general case to the regular case. 

7.1 Noise sensitivity of Regular PTFs 

At a high level, we bound the noise sensitivity of regular PTFs as follows: (1) Reduce the problem to 
that of proving certain anti-concentration bounds for regular PTFs over the hypercube. (2) Use the 
invariance principle of Mossel et al. [MOO05] to reduce proving anti-concentration bounds over the 
hypercube to that of proving anti-concentration bounds over Gaussian distributions. (3) Use the 
Carbery-Wright anti-concentration bounds [CW01] for polynomials over log-concave distributions.

For the rest of this section, we fix a degree $d$ multilinear polynomial $P$ and a corresponding degree
$d$ PTF $f$. Recall that it suffices to consider multilinear polynomials as we are working over the
hypercube. We first reduce bounding noise sensitivity to proving anti-concentration bounds.

Lemma 7.1. For $0 < \rho < 1$, $\delta > 0$,

$\mathrm{NS}_\rho(f) \le (d+1)\delta + \Pr_{X \in_u \{1,-1\}^n}\left[|P(X) - \theta| \le 2\sqrt{\rho}/\delta\right].$

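Proof. Let $S$ be a random subset $S \subseteq [n]$ where each $i \in [n]$ is in $S$ independently with probability
$\rho$. From the definition of noise sensitivity it easily follows that

$\mathrm{NS}_\rho(f) = \Pr_{X \in_u \{1,-1\}^n,\, S}\left[\mathrm{sign}(P(X) - \theta) \neq \mathrm{sign}\left(P(X) - 2 \sum_{I : |I \cap S| \text{ is odd}} a_I X^I - \theta\right)\right]$
$\le \Pr_{X, S}\left[|P(X) - \theta| \le 2 \left|\sum_{I : |I \cap S| \text{ is odd}} a_I X^I\right|\right]$
$\le \Pr_{X, S}\left[\left|\sum_{I : |I \cap S| \text{ is odd}} a_I X^I\right| \ge \sqrt{\rho}/\delta\right] + \Pr_X\left[|P(X) - \theta| \le 2\sqrt{\rho}/\delta\right].$ (7.1)

Define a non-negative random variable $P_S$ as follows: $P_S^2 = \sum_{I : |I \cap S| \text{ is odd}} a_I^2$. We can then bound
the first quantity in the above expression using $P_S$ as follows:

$\Pr_{X, S}\left[\left|\sum_{I : |I \cap S| \text{ is odd}} a_I X^I\right| \ge \sqrt{\rho}/\delta\right] \le \Pr_{X, S}\left[\left|\sum_{I : |I \cap S| \text{ is odd}} a_I X^I\right| \ge P_S/\sqrt{\delta}\right] + \Pr_S\left[P_S \ge \sqrt{\rho}/\sqrt{\delta}\right].$ (7.2)

Since $\mathbb{E}_X\left[\left(\sum_{I : |I \cap S| \text{ is odd}} a_I X^I\right)^2\right] = P_S^2$, by Markov's inequality, we have

$\Pr_{X \in_u \{1,-1\}^n}\left[\left|\sum_{I : |I \cap S| \text{ is odd}} a_I X^I\right| \ge P_S/\sqrt{\delta}\right] \le \delta.$ (7.3)

Now, note that $P_S^2 \le \sum_{i \in S} w_i^2(P)$. Thus, $\mathbb{E}_S[P_S^2] \le \mathbb{E}_S\left[\sum_{i \in S} w_i^2(P)\right] = \rho \sum_i w_i^2(P) \le d\rho$. Hence,
by Markov's inequality, $\Pr_S[P_S \ge \sqrt{\rho}/\sqrt{\delta}] \le d\delta$. The lemma now follows by combining Equations
(7.1), (7.2), (7.3) and the above equation. □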
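
The first step of the proof rests on the identity $P(Z) = P(X) - 2\sum_{I : |I \cap S| \text{ odd}} a_I X^I$, where $Z$ is obtained from $X$ by negating the coordinates in $S$: a monomial changes sign exactly when it contains an odd number of flipped variables. The Python sketch below checks this identity exhaustively on a small multilinear polynomial. The dictionary representation, the 0-based indices and all function names are our own assumptions.

```python
from itertools import product
from math import prod


def evaluate(coeffs, x):
    """Evaluate sum_I a_I * prod_{i in I} x_i at the point x."""
    return sum(a * prod(x[i] for i in monomial) for monomial, a in coeffs.items())


def odd_part(coeffs, x, flipped):
    """Sum of a_I * X^I over the monomials I meeting the flipped set in an odd number of indices."""
    return sum(
        a * prod(x[i] for i in monomial)
        for monomial, a in coeffs.items()
        if len(set(monomial) & flipped) % 2 == 1
    )


def check_flip_identity(coeffs, n):
    """Verify P(Z) = P(X) - 2 * odd_part for every x in {1,-1}^n and every flipped set S."""
    for x in product((1, -1), repeat=n):
        for mask in product((0, 1), repeat=n):
            flipped = {i for i in range(n) if mask[i]}
            z = tuple(-x[i] if i in flipped else x[i] for i in range(n))
            expected = evaluate(coeffs, x) - 2 * odd_part(coeffs, x, flipped)
            if abs(evaluate(coeffs, z) - expected) > 1e-9:
                return False
    return True


sample_poly = {(0,): 0.5, (0, 1): -1.0, (1, 2): 2.0, (0, 1, 2): 0.25}
```

The check passes for any multilinear polynomial, which is why the sign of $P(Z) - \theta$ can differ from that of $P(X) - \theta$ only when the odd part is at least half as large as $|P(X) - \theta|$.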

We now prove an anti-concentration bound for regular PTFs. 
Lemma 7.2. If $P$ is $\epsilon$-regular, then for any interval $I \subseteq \mathbb{R}$ of length at most $\alpha$,

$$\Pr_{X \in_u \{1,-1\}^n}[P(X) \in I] = O_d\big(\alpha^{1/d} + \epsilon^{2/(4d+1)}\big).$$

Proof. Let $Z_1 = P(X)$, $Z_2 = P(Y)$ for $X \in_u \{1,-1\}^n$, $Y \leftarrow \mathcal{N}^n$. Then, since $P$ is $\epsilon$-regular,
by Theorem 4.5, for all $t \in \mathbb{R}$, $|\Pr[Z_1 > t] - \Pr[Z_2 > t]| = O_d(\epsilon^{2/(4d+1)})$. Now, by the above
equation and Theorem 4.4 applied to the random variable $Y$ for interval $I$, $\Pr[Z_1 \in I] = \Pr[Z_2 \in I] + O_d(\epsilon^{2/(4d+1)}) = O_d(\alpha^{1/d} + \epsilon^{2/(4d+1)})$. □

We can now obtain a bound on noise sensitivity of regular PTFs. 

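Theorem 7.3. If $f$ is an $\epsilon$-regular PTF of degree $d$, then $\mathrm{NS}_\epsilon(f) \le O_d(\epsilon^{1/(2d+2)})$.

Proof. Let $\delta > 0$ be a parameter to be chosen later. Then, by Lemma 7.1 and Lemma 7.2 above, $\mathrm{NS}_\epsilon(f) =
O_d\big(\delta + \epsilon^{2/(4d+1)} + \epsilon^{1/(2d)}/\delta^{1/d}\big)$. Choosing $\delta = \epsilon^{1/(2d+2)}$ we get $\mathrm{NS}_\epsilon(f) = O_d(\epsilon^{1/(2d+2)})$. □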
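
The choice $\delta = \epsilon^{1/(2d+2)}$ can be checked by comparing exponents of $\epsilon$ in the three terms of the bound. The sketch below does this with exact rational arithmetic; the function name `balanced_exponents` is an illustrative assumption.

```python
from fractions import Fraction


def balanced_exponents(d):
    """Exponents of eps in the three terms of the bound when delta = eps^(1/(2d+2))."""
    delta_exp = Fraction(1, 2 * d + 2)                 # delta = eps^(1/(2d+2))
    invariance_exp = Fraction(2, 4 * d + 1)            # eps^(2/(4d+1))
    anticonc_exp = (Fraction(1, 2) - delta_exp) / d    # eps^(1/(2d)) / delta^(1/d)
    return delta_exp, invariance_exp, anticonc_exp
```

For every $d \ge 1$ the first and third exponents coincide, and the second is larger, so for $\epsilon \le 1$ the invariance term is dominated by the other two.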

7.2 Noise Sensitivity of arbitrary PTFs

We prove Theorem 1.3 by recursively applying the following lemma. 

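Lemma 7.4. For every $d$ there exist universal constants $c_d, A_d \in \mathbb{N}$, $\alpha_d \in (0,1)$ such that for
$M = \min(K(P,\epsilon),\, c_d \log(1/\epsilon)/\epsilon^2)$ and $X^M = (X_1, \ldots, X_M) \in_u \{1,-1\}^M$,

$$\Pr_{X^M}\big[\mathrm{NS}_\epsilon(f_{X^M}) \le A_d\, \epsilon^{1/(2d+2)}\big] \ge \alpha_d. \tag{7.4}$$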



Proof. Let $a_d, b_d, c_d, \gamma_d, \delta_d$ be the constants from Lemmas 5.1, 5.2. Let $\alpha_d = \min(\gamma_d, \delta_d)$. We
consider two cases.

Case (i): $M = K(P,\epsilon)$. Then, by Lemma 5.1 and Theorem 7.3, for $X^K \in_u \{1,-1\}^K$, with
probability at least $\alpha_d$, $\mathrm{NS}_\epsilon(f_{X^K}) \le A_d\, \epsilon^{1/(2d+2)}$ for some constant $A_d$.

Case (ii): $M = c_d \log(1/\epsilon)/\epsilon^2$. Then, by Lemma 5.2, $X^M \in_u \{1,-1\}^M$ is $b_d\epsilon$-determining with
probability at least $\alpha_d$. Further, if $X^M$ is $b_d\epsilon$-determining, with $f_{X^M}$ biased towards $b \in \{1,-1\}$,
then

$$\mathrm{NS}_\epsilon(f_{X^M}) = \Pr_{Z_1 \in_u \{1,-1\}^{n-M},\, Z_2 \in_\epsilon Z_1}\big[f_{X^M}(Z_1) \ne f_{X^M}(Z_2)\big] \le 2 \Pr_{Z \in_u \{1,-1\}^{n-M}}\big[f_{X^M}(Z) \ne b\big] \le 2 b_d \epsilon,$$

where $Z_2 \in_\epsilon Z_1$ is an $\epsilon$-perturbation of $Z_1$. The lemma now follows. □

Proof of Theorem 1.3. Let $c_d, A_d, \alpha_d$ be as in the above lemma and let $L = c_d \log(1/\epsilon)/\epsilon^2$ and
$t = \log_{1/(1-\alpha_d)}(1/\epsilon)$, so that $(1-\alpha_d)^t = \epsilon$. We will show that for $\delta = \epsilon^{1/(2d+2)}/(Lt) = \Omega_d\big(\epsilon^{(4d+5)/(2d+2)}/\log^2(1/\epsilon)\big)$,

$$\mathrm{NS}_\delta(f) = O_d(\epsilon^{1/(2d+2)}).$$

For $S \subseteq [n]$ and $x \in \{1,-1\}^n$, write $\bar{S} = [n] \setminus S$ and let $P_{x,S} : \{1,-1\}^{\bar{S}} \to \mathbb{R}$ be the degree at most $d$
polynomial defined by $P_{x,S}(X_{\bar{S}}) = P(x|_S, X_{\bar{S}})$. Fix an $x = (x_1, \ldots, x_n) \in \{1,-1\}^n$ and define $S_{x,i} \subseteq [n]$ for
$i \ge 1$ recursively as follows. $S_{x,1}$ is the set of $M_1 \le L$ largest weight coordinates in $P$ given by
applying Lemma 7.4 to $P$. For $i \ge 1$, let $S^{x,i} = S_{x,1} \cup S_{x,2} \cup \ldots \cup S_{x,i}$.

For $i \ge 1$, let $S_{x,i+1}$ be the set of $M_{i+1} \le L$ largest weight coordinates in $P_{x,S^{x,i}}$ given by
applying Lemma 7.4 to the polynomial $P_{x,S^{x,i}}$. Define $f_{x,i}$ by $f_{x,i}(\cdot) = \mathrm{sgn}(P_{x,S^{x,i}}(\cdot) - \theta)$. Note that
the definition of $f_{x,i}$ only depends on $x_j$ for $j \in S^{x,i}$ and that $|S^{x,i}| \le L \cdot i$.

Call $x \in \{1,-1\}^n$ $(\epsilon,f)$-good if there exists an $i$, $1 \le i \le t$, such that $\mathrm{NS}_\epsilon(f_{x,i}) \le A_d\, \epsilon^{1/(2d+2)}$,
and let $t_x$ be such an $i$ for an $(\epsilon,f)$-good $x$. Then, from the definition of $f_{x,i}$ and Lemma 7.4,

$$\Pr_{x \in_u \{1,-1\}^n}[x \text{ is } (\epsilon,f)\text{-good}] \ge 1 - \epsilon. \tag{7.5}$$

Let $y \in_\delta x$ be a $\delta$-perturbation of $x \in_u \{1,-1\}^n$. Then, since $|S^{x,t_x}| \le Lt$,

$$\Pr_{x,y}\big[x|_{S^{x,t_x}} \ne y|_{S^{x,t_x}}\big] \le Lt\delta = \epsilon^{1/(2d+2)}. \tag{7.6}$$

Also note that for any $i \ge 1$, conditioned on an assignment for the values in $x|_{S^{x,i}}$ and on $x|_{S^{x,i}} =
y|_{S^{x,i}}$, $\Pr_{x,y}[f(x) \ne f(y)] = \mathrm{NS}_\delta(f_{x,i}) \le \mathrm{NS}_\epsilon(f_{x,i})$. Thus, conditioned on $x$ being $(\epsilon,f)$-good and
$x|_{S^{x,t_x}} = y|_{S^{x,t_x}}$,

$$\Pr_{x,y}[f(x) \ne f(y)] \le \mathrm{NS}_\epsilon(f_{x,t_x}) \le A_d\, \epsilon^{1/(2d+2)}. \tag{7.7}$$

Combining (7.5), (7.6), (7.7), we get

$$\mathrm{NS}_\delta(f) \le \epsilon + Lt\delta + A_d\, \epsilon^{1/(2d+2)} = O_d(\epsilon^{1/(2d+2)}).$$

Since $\delta = \Omega_d\big(\epsilon^{(4d+5)/(2d+2)}/\log^2(1/\epsilon)\big)$ and the above is applicable for all $\epsilon > 0$, we get that for all $\rho > 0$,

$$\mathrm{NS}_\rho(f) = O_d\big(\log(1/\rho)\, \rho^{1/(4d+5)}\big) = O_d\big(\rho^{1/(4d+6)}\big).$$

□
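
The relationship between $\epsilon$, $L$, $t$ and $\delta$ in the proof can be made concrete with a small numeric sketch. The values `c_d=1.0` and `alpha_d=0.5` below are hypothetical placeholders (the paper does not give numeric constants), and $t$ is taken to be the number of rounds for which $(1-\alpha_d)^t = \epsilon$.

```python
import math


def theorem_1_3_parameters(eps, d, c_d=1.0, alpha_d=0.5):
    """Return (L, t, delta) for the recursion, under hypothetical constants c_d and alpha_d."""
    L = c_d * math.log(1 / eps) / eps ** 2                 # coordinates fixed per round
    t = math.log(1 / eps) / math.log(1 / (1 - alpha_d))    # rounds, so (1 - alpha_d)^t = eps
    delta = eps ** (1 / (2 * d + 2)) / (L * t)             # noise rate handled by the proof
    return L, t, delta
```

With these choices the union-bound term $Lt\delta$ in (7.6) equals $\epsilon^{1/(2d+2)}$, and the probability that no round succeeds is $(1-\alpha_d)^t = \epsilon$, matching (7.5).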

8 Average sensitivity of PTFs 

In this section we bound the average sensitivity of PTFs on the Boolean hypercube, proving
Theorem 1.6. We first prove a lemma bounding the average sensitivity of a Boolean function in terms of
its noise sensitivity. Theorem 1.6 follows immediately from Theorem 1.3 and the following lemma:

Lemma 8.1 (noise sensitivity to average sensitivity). For any Boolean function $f : \{1,-1\}^n \to
\{1,-1\}$, $\mathrm{AS}(f) \le 2ne\, \mathrm{NS}_{1/n}(f)$.

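Proof. Let $\delta = 1/n$. Let $X \in_u \{1,-1\}^n$ and let $S \subseteq [n]$ be a random set with each element
$i \in [n]$ present in $S$ independently with probability $\delta$. Let $X(S)$ be the vector obtained by flipping
the coordinates of $X$ in $S$. Then, $\mathrm{NS}_\delta(f) = \Pr_{X,S}[f(X) \ne f(X(S))]$. Observe that for $i \in [n]$,
$\Pr[S = \{i\}] = \delta(1-\delta)^{n-1} = (1/n)(1 - 1/n)^{n-1} \ge 1/(2ne)$. Therefore,

$$\begin{aligned}
\mathrm{NS}_\delta(f) &= \Pr_{X,S}[f(X) \ne f(X(S))] \\
&= \sum_i \Pr_S[S = \{i\}] \cdot \Pr_X[f(X) \ne f(X(S)) \mid S = \{i\}] + \Pr_S[|S| \ne 1] \cdot \Pr_{X,S}[f(X) \ne f(X(S)) \mid |S| \ne 1] \\
&\ge \frac{1}{2ne} \sum_i I_i(f) = \frac{\mathrm{AS}(f)}{2ne},
\end{aligned}$$

where the last line uses that, conditioned on $S = \{i\}$, the probability that $f(X) \ne f(X(S))$ is the
influence $I_i(f)$ of the $i$-th coordinate, and that the second term is non-negative.

□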
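
Lemma 8.1 can be checked exactly on small examples by enumerating all inputs and all flip sets. The sketch below is a brute-force illustration under that assumption (it is exponential in $n$); the helper names and the two test functions `majority` and `parity` are illustrative choices, not taken from the paper.

```python
import itertools
import math


def average_sensitivity(f, n):
    """AS(f) = sum_i Pr_x[f(x) != f(x with coordinate i flipped)]."""
    points = list(itertools.product((1, -1), repeat=n))
    count = 0
    for x in points:
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(y):
                count += 1
    return count / len(points)


def noise_sensitivity(f, n, delta):
    """NS_delta(f) = Pr[f(X) != f(X(S))], each i in S independently with probability delta."""
    points = list(itertools.product((1, -1), repeat=n))
    total = 0.0
    for flips in itertools.product((False, True), repeat=n):
        k = sum(flips)
        weight = delta ** k * (1 - delta) ** (n - k)       # probability of this flip set S
        disagree = sum(
            1 for x in points
            if f(x) != f(tuple(-xi if fl else xi for xi, fl in zip(x, flips)))
        )
        total += weight * disagree / len(points)
    return total


def majority(x):
    return 1 if sum(x) > 0 else -1


def parity(x):
    return math.prod(x)
```

For example, with `n = 5` one can compare `average_sensitivity(f, 5)` against `2 * 5 * math.e * noise_sensitivity(f, 5, 1 / 5)` for both test functions.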

We now give a bound of $O(n^{1-2^{-d}})$ on the average sensitivity using a different, combinatorial
argument that does not use the noise sensitivity bounds.

Theorem 8.2. For any degree $d$ PTF $f : \{1,-1\}^n \to \{1,-1\}$, $\mathrm{AS}(f) \le 3n^{1-2^{-d}}$.

We first show the theorem using Lemma 1.7. 

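Proof. Let $P(x) = x_i P_i(x_{-i}) + Q_i(x_{-i})$ where $P_i(\cdot)$, $Q_i(\cdot)$ are degree $d-1$ and degree $d$ polynomials
respectively that do not depend on $x_i$. Define $f_i(x_{-i}) = \mathrm{sgn}(P_i(x_{-i}))$ and $g_i(x) = f(x) f_i(x_{-i})$.
Then, writing $X^{(i)}$ for $X$ with the $i$-th coordinate flipped,

$$I_i(f) = \Pr_{X \in_u \{1,-1\}^n}\big[f(X) \ne f(X^{(i)})\big] = \Pr_{X \in_u \{1,-1\}^n}\big[f(X) f_i(X_{-i}) \ne f(X^{(i)}) f_i(X_{-i})\big] = \Pr_{X \in_u \{1,-1\}^n}\big[g_i(X) \ne g_i(X^{(i)})\big] = I_i(g_i).$$

Observe that $g_i$ is monotone increasing in $x_i$ for $i \in [n]$ and hence $I_i(g_i) = E_X[X_i g_i(X)]$. Thus,

$$\mathrm{AS}(f) = \sum_i I_i(f) = \sum_i I_i(g_i) = \sum_i E_X[X_i g_i(X)] = \sum_i E_X[X_i f(X) f_i(X_{-i})] = E_X\Big[f(X) \sum_i X_i f_i(X_{-i})\Big].$$

Since $|f(x)| \le 1$ for all $x$, we have

$$\mathrm{AS}(f) \le E_X\Big[\Big|\sum_i X_i f_i(X_{-i})\Big|\Big]. \tag{8.1}$$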



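We now use induction and Lemma 1.7. For an LTF $f$, the $f_i$ as defined above are constants.
Therefore, by Equation (8.1),

$$\mathrm{AS}(f) \le E_x\Big[\Big|\sum_i x_i f_i\Big|\Big] = O(\sqrt{n}),$$

since the $x_i f_i$ are independent uniform signs, so the expectation is at most $\sqrt{n}$ by Cauchy-Schwarz.

Suppose the theorem is true for degree $d$ PTFs and let $f$ be a degree $d+1$ PTF and let $f_i$ be as
defined before. Then, by Equation (8.1) and Lemma 1.7,

$$\mathrm{AS}(f)^2 \le 2\sum_i \mathrm{AS}(f_i) + n \le \sum_i 6n^{1-2^{-d}} + n \le 7n^{2-2^{-d}}.$$

Therefore, $\mathrm{AS}(f) \le 3n^{1-2^{-(d+1)}}$. The theorem follows by induction.

□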
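
The identity $\mathrm{AS}(f) = E_X[f(X) \sum_i X_i f_i(X_{-i})]$ used to derive (8.1) can be verified by enumeration on a small PTF. The sketch below assumes the convention $\mathrm{sgn}(0) = 1$ and uses a random degree-2 polynomial as a hypothetical test case. It returns the two sides scaled by $2^n$ so that they can be compared as integers.

```python
import itertools
import random


def sgn(v):
    """Sign with the convention sgn(0) = 1 (an assumption made for this sketch)."""
    return 1 if v >= 0 else -1


def eval_poly(coeffs, x):
    """Evaluate sum_I a_I * prod_{i in I} x_i, with monomials given as index tuples."""
    total = 0.0
    for monomial, a in coeffs.items():
        term = a
        for i in monomial:
            term *= x[i]
        total += term
    return total


def sensitivity_identity(n=6, d=2, seed=1):
    """Return (2^n * AS(f), 2^n * E[f(X) * sum_i X_i f_i(X_{-i})]) for a random degree-d PTF."""
    rng = random.Random(seed)
    coeffs = {I: rng.gauss(0, 1)
              for k in range(d + 1)
              for I in itertools.combinations(range(n), k)}
    flips = 0   # number of pairs (x, i) with f(x) != f(x^(i))
    corr = 0    # sum over x of f(x) * sum_i x_i * f_i(x_{-i})
    for x in itertools.product((1, -1), repeat=n):
        fx = sgn(eval_poly(coeffs, x))
        for i in range(n):
            p_plus = eval_poly(coeffs, x[:i] + (1,) + x[i + 1:])
            p_minus = eval_poly(coeffs, x[:i] + (-1,) + x[i + 1:])
            if sgn(p_plus) != sgn(p_minus):
                flips += 1
            # P_i(x_{-i}) = (P(x with x_i = 1) - P(x with x_i = -1)) / 2
            f_i = sgn(p_plus - p_minus)
            corr += fx * x[i] * f_i
    return flips, corr
```

The two returned integers agree because, for each fixed $x_{-i}$, the pair of points $x_i = \pm 1$ contributes 2 to both counts when $f$ changes sign along coordinate $i$ and 0 to both otherwise.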



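Proof of Lemma 1.7. For brevity, let $f_i(x) = f_i(x_{-i})$. By Cauchy-Schwarz, for any random variable
$Z$ we have $E[|Z|]^2 \le E[Z^2]$. Thus,

$$\Big(E_x\Big[\Big|\sum_i x_i f_i(x)\Big|\Big]\Big)^2 \le E_x\Big[\Big(\sum_i x_i f_i(x)\Big)^2\Big] = E_x\Big[\sum_{i,j} x_i x_j f_i(x) f_j(x)\Big] = n + \sum_{i \ne j} E[X_i X_j f_i(X) f_j(X)]. \tag{8.2}$$

For $i \ne j \in [n]$, let $x_{-ij} = (x_k : k \in [n], k \ne i, j)$ and let $S_i^j = \{x \in \{1,-1\}^n : f_i(x) \ne f_i(x^{(j)})\}$,
where $x^{(j)}$ is $x$ with the $j$-th coordinate flipped. Note that $I_j(f_i) = \Pr_x[X \in S_i^j]$. Now,

$$E[X_i X_j f_i(X) f_j(X)] = \sum_{x \in S_i^j \cup S_j^i} \mu(x)\, x_i x_j f_i(x) f_j(x) + \sum_{x \notin S_i^j \cup S_j^i} \mu(x)\, x_i x_j f_i(x) f_j(x), \tag{8.3}$$

where $\mu(x) = 1/2^n$ is the probability of choosing $x$ under the uniform distribution. We bound the
first term in the above expression by the average sensitivity of the $f_i$'s and show that the second
term vanishes. Observe that,

$$\sum_{x \in S_i^j \cup S_j^i} \mu(x)\, x_i x_j f_i(x) f_j(x) \le \mu(S_i^j \cup S_j^i) \le \mu(S_i^j) + \mu(S_j^i) = I_j(f_i) + I_i(f_j). \tag{8.4}$$

Note that for $x \notin S_i^j \cup S_j^i$, $f_i(x), f_j(x)$ are both independent of the values of $x_i, x_j$. For such $x$
(abusing notation) let $f_i(x_{-ij}) = f_i(x)$, $f_j(x_{-ij}) = f_j(x)$ and let $T_{ij} = \{(x_k : k \ne i,j) : x \notin S_i^j \cup S_j^i\}$.
Then, since for $x \notin S_i^j \cup S_j^i$, $f_i(x), f_j(x)$ depend only on $x_{-ij}$, we get that $x \notin S_i^j \cup S_j^i$ if and only
if $x_{-ij} \in T_{ij}$. Therefore,

$$\begin{aligned}
\sum_{x \notin S_i^j \cup S_j^i} \mu(x)\, x_i x_j f_i(x) f_j(x) &= \sum_{x_{-ij} \in T_{ij}} \sum_{x_i, x_j} \mu(x_{-ij})\, \mu(x_i)\, \mu(x_j)\, x_i x_j\, f_i(x_{-ij})\, f_j(x_{-ij}) \\
&= \sum_{x_{-ij} \in T_{ij}} \mu(x_{-ij})\, f_i(x_{-ij})\, f_j(x_{-ij})\, E[x_i x_j] = 0.
\end{aligned} \tag{8.5}$$

From Equations (8.2), (8.3), (8.4), (8.5) we have,

$$\Big(E_x\Big[\Big|\sum_i x_i f_i(x)\Big|\Big]\Big)^2 \le n + \sum_{i \ne j} \big(I_j(f_i) + I_i(f_j)\big) = n + 2\sum_i \sum_{j \ne i} I_j(f_i) = n + 2\sum_i \mathrm{AS}(f_i).$$

□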



Remark 8.3. The bound of Lemma 1.7 is tight up to a constant factor if we only have bounds on
the average sensitivity of the $f_i$'s to go with. For example, consider $f_i$ defined as follows. Divide $[n]$
into $m = \sqrt{n}$ blocks $B_1, \ldots, B_m$ of size $m$ each and for $1 \le j \le m$, $i \in B_j$, let $f_i = \prod_{k \in B_j : k \ne i} x_k$.
Then, the left hand side of the lemma is $\Theta(n^{3/2})$ and $\mathrm{AS}(f_i) = m - 1 = \Theta(\sqrt{n})$ for all $i$.
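
The block example of Remark 8.3 is small enough to evaluate exactly for $m = 2$ and $m = 3$. The sketch below computes both sides of Lemma 1.7 by enumeration; the function name `block_example` is an illustrative assumption, and the running time is exponential in $n = m^2$.

```python
import itertools
import math


def block_example(m):
    """Return (lhs, rhs) of Lemma 1.7 for the block construction with n = m * m variables."""
    n = m * m
    blocks = [range(j * m, (j + 1) * m) for j in range(m)]
    block_of = {i: block for block in blocks for i in block}
    total_abs = 0
    for x in itertools.product((1, -1), repeat=n):
        s = 0
        for i in range(n):
            # f_i is the parity of the other variables in the block containing i
            f_i = math.prod(x[k] for k in block_of[i] if k != i)
            s += x[i] * f_i
        total_abs += abs(s)
    lhs = (total_abs / 2 ** n) ** 2      # (E|sum_i x_i f_i(x)|)^2
    rhs = n + 2 * n * (m - 1)            # n + 2 * sum_i AS(f_i), using AS(f_i) = m - 1
    return lhs, rhs
```

Here $\sum_i x_i f_i(x) = m \sum_j \prod_{k \in B_j} x_k$ is $m$ times a sum of $m$ independent uniform signs, which is why both sides grow as $n^{3/2}$.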

Appendix

A Bounding the perturbation polynomial in the Gaussian setting

A.1 Background on Hermite polynomials



The univariate Hermite polynomials are defined as follows:

$$H_k(x) = \frac{(-1)^k}{\sqrt{k!}}\, e^{x^2/2}\, \frac{d^k}{dx^k}\, e^{-x^2/2}.$$

The univariate Hermite polynomials satisfy $H_k'(x) = \sqrt{k}\, H_{k-1}(x)$.

The multivariate Hermite polynomials over $n$ variables $(x_1, \ldots, x_n)$ are defined as follows. Let
$S \subseteq [n]$ be a multiset. It will be convenient to denote a multiset $S$ by a sequence of $n$ indices as
$S = (s_1, \ldots, s_n)$ where each $s_i$ denotes the cardinality of element $i \in [n]$ in the set $S$. Note, by this
notation, $|S| = \sum_i s_i$.

$$H_S(x_1, \ldots, x_n) = \prod_{i=1}^n H_{s_i}(x_i).$$

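The partial derivatives of the multivariate Hermite polynomials can now be calculated as follows:

$$(\partial H_S)_i(x_1, \ldots, x_n) = \sqrt{s_i}\, H_{s_i - 1}(x_i) \prod_{j \ne i} H_{s_j}(x_j) = \sqrt{s_i}\, H_{S \setminus \{i\}}(x_1, \ldots, x_n).$$

Furthermore, the iterated partial derivatives can be calculated as follows. Let $R = (r_1, \ldots, r_n) \subseteq
S$ be any multiset. Then

$$(\partial H_S)_R = \Big[\prod_{i=1}^n \sqrt{\frac{s_i!}{(s_i - r_i)!}}\Big] \cdot H_{S \setminus R}.$$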



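This in particular gives the following Taylor series expansion for $H_S(z) = H_S(z_1, \ldots, z_n)$ about the
point $x = (x_1, \ldots, x_n)$ for multisets $S$. Let $|S| = d$. Since $H_S$ depends on at most $d$ variables, we
can assume without loss of generality that $H_S$ is defined on the first $d$ variables, i.e., $S \subseteq [d]$ and
$H_S(x) = H_S(x_1, \ldots, x_d)$.

$$\begin{aligned}
H_S(z) &= H_S(x) + \sum_{k=1}^{d} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!}\, (\partial H_S)_R(x) \cdot \Big(\prod_{i=1}^d (z_i - x_i)^{r_i}\Big) \\
&= H_S(x) + \sum_{k=1}^{d} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!} \Big[\prod_{i=1}^d \sqrt{\frac{s_i!}{(s_i - r_i)!}}\Big] H_{S \setminus R}(x) \cdot \Big(\prod_{i=1}^d (z_i - x_i)^{r_i}\Big).
\end{aligned} \tag{A.1}$$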



The multivariate Hermite polynomials up to degree $d$ form a basis for the set of all multivariate
polynomials of degree $d$. In particular, given any degree $d$ polynomial $P(x_1, \ldots, x_n)$, we can write
it as a linear combination of Hermite polynomials as follows:

$$P(x_1, \ldots, x_n) = \sum_{S \subseteq [n] : |S| \le d} P_S\, H_S(x_1, \ldots, x_n).$$

The values $P_S$ are called the Hermite coefficients of $P$.

The Hermite polynomials are especially useful while working over the (multivariate) normal
distribution due to the following orthonormality conditions:

$$E_{X \leftarrow \mathcal{N}^n}[H_S(X) H_T(X)] = \begin{cases} 1 & \text{if } S = T, \\ 0 & \text{otherwise.} \end{cases}$$

This implies that $\|P\|^2 = E_{X \leftarrow \mathcal{N}^n}[P^2(X)] = \sum_S P_S^2$.


A.2 Proof of Claim 6.1

Recall that we must prove there exists a constant $c_d$ such that $\|Q\| \le c_d\sqrt{\delta}$.



Proof. Given any degree $d$ multivariate polynomial $P$, we can write it in the Hermite basis as
$\sum_{|S| \le d} P_S H_S(x)$ and use this expansion to bound $\|Q\| = \|P(Z) - P(X)\|$ as follows.

$$\begin{aligned}
\|Q\|^2 &= E\big[(P(Z) - P(X))^2\big] = E\Big[\Big(\sum_{S : |S| \le d} P_S\, (H_S(Z) - H_S(X))\Big)^2\Big] \\
&= \sum_{S,T} P_S P_T\, E\big[(H_S(Z) - H_S(X)) \cdot (H_T(Z) - H_T(X))\big] \\
&= \sum_{S} P_S^2\, E\big[(H_S(Z) - H_S(X))^2\big] + \sum_{S \ne T} P_S P_T\, E\big[(H_S(Z) - H_S(X)) \cdot (H_T(Z) - H_T(X))\big] \\
&= \sum_{S} P_S^2\, E\big[(H_S(Z) - H_S(X))^2\big] - \sum_{S \ne T} P_S P_T \big(E[H_S(Z) H_T(X)] + E[H_S(X) H_T(Z)]\big)
\end{aligned} \tag{A.2}$$

where the last step follows from the orthonormality of the Hermite polynomials.

We will now show that $E[H_S(X) H_T(Z)] = 0$ for $S \ne T$. Since $S \ne T$, it suffices to show the
following univariate case: $E[H_s(X) H_t(Z)] = 0$ for $s \ne t$. We now observe that the joint distribution
$(X, Z)$ is identical to the distribution $(Z, X)$. Hence, to calculate $E[H_s(X) H_t(Z)]$ for $s \ne t$ we can
assume without loss of generality that $s > t$. Now, $H_t(Z) = H_t((1-\delta)X + \rho Y)$ is a bivariate degree
$t$ polynomial and can be expanded in the Hermite basis as $\sum_{i,j=0}^{t} a_{ij} H_i(X) H_j(Y)$. We thus have
$E_{X,Y}[H_s(X) H_t(Z)] = \sum_{i,j=0}^{t} a_{ij}\, E_X[H_s(X) H_i(X)] \cdot E_Y[H_j(Y)] = 0$ since $s > t \ge i$.

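Plugging this into the expression for $\|Q\|$ in (A.2), we have $\|Q\|^2 = \sum_S P_S^2\, E\big[(H_S(Z) - H_S(X))^2\big]$.

Since $\|P\|^2 = \sum_S P_S^2 = 1$, to prove the claim it suffices to show that there exists a constant $c_d$
such that for any multiset $S$, $\|H_S(Z) - H_S(X)\|^2 \le c_d^2 \delta$. We bound the norm $\|H_S(Z) - H_S(X)\|$
using the Taylor series expansion of $H_S(Z)$ as stated in equation (A.1). Let $|S| = d$; then we have

$$\begin{aligned}
E\big[|H_S(Z) - H_S(X)|\big]
&\le \sum_{k=1}^{d} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!} \Big[\prod_{i=1}^d \sqrt{\frac{s_i!}{(s_i - r_i)!}}\Big] \cdot E\Big[\Big|H_{S \setminus R}(X) \prod_{i=1}^d (Z_i - X_i)^{r_i}\Big|\Big] \\
&\le \sum_{k=1}^{d} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!} \cdot d^{k/2} \cdot E\Big[\Big|H_{S \setminus R}(X) \prod_{i=1}^d (Z_i - X_i)^{r_i}\Big|\Big]
&& \text{[since each } s_i \le d \text{ and } \textstyle\sum_i r_i = |R| = k \text{]} \\
&\le \sum_{k=1}^{d} d^{k/2} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!} \sqrt{E\big[H_{S \setminus R}^2(X)\big]} \cdot \sqrt{E\Big[\prod_{i=1}^d (Z_i - X_i)^{2r_i}\Big]}
&& \text{[by the Cauchy-Schwarz inequality]} \\
&= \sum_{k=1}^{d} d^{k/2} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!} \sqrt{\prod_{i=1}^d E\big[(Z_i - X_i)^{2r_i}\big]}
&& \text{[by orthonormality of } H_{S \setminus R} \text{ and independence of } (Z_i - X_i) \text{ over the } i\text{'s]} \\
&= \sum_{k=1}^{d} d^{k/2} \sum_{R : |R| = k} \frac{1}{\prod_{i=1}^d r_i!} \sqrt{\prod_{i=1}^d \delta^{r_i}\, \frac{(2r_i)!}{r_i!}}
&& \text{[since } Z_i - X_i \sim \mathcal{N}(0, \sqrt{2\delta}) \text{, whose } 2r\text{-th moment is } \delta^r \tfrac{(2r)!}{r!} \text{]} \\
&= \sum_{k=1}^{d} (d\delta)^{k/2} \sum_{R : |R| = k} \sqrt{\prod_{i=1}^d \frac{(2r_i)!}{(r_i!)^3}} \\
&\le \sum_{k=1}^{d} (d\delta)^{k/2} \cdot d^k \cdot 2^{k/2} = \sum_{k=1}^{d} \big(d^{3/2}\sqrt{2\delta}\big)^k \\
&\le 2d^{3/2}\sqrt{2\delta}
&& \text{[if } d^{3/2}\sqrt{2\delta} \le 1/2 \text{]}
\end{aligned}$$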
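
Three elementary facts are used in the chain of inequalities above: the moment formula for a centered Gaussian of variance $2\delta$, the bound $(2r)!/(r!)^3 \le 2^r$ behind the factor $2^{k/2}$, and the geometric-sum bound in the last step. The sketch below checks each with exact rational arithmetic on sample values; the helper names and the sample ranges are illustrative assumptions, so this is a sanity check rather than a proof.

```python
from fractions import Fraction
from math import factorial


def even_moment(variance, r):
    """2r-th moment of N(0, variance): variance^r * (2r - 1)!!."""
    double_factorial = 1
    for odd in range(1, 2 * r, 2):
        double_factorial *= odd
    return variance ** r * double_factorial


def per_coordinate_factor(r):
    """The quantity (2r)! / (r!)^3 that appears under the square root."""
    return Fraction(factorial(2 * r), factorial(r) ** 3)


def geometric_sum(q, d):
    """sum_{k=1}^{d} q^k."""
    return sum(q ** k for k in range(1, d + 1))
```

With `delta = Fraction(1, 50)`, for instance, `even_moment(2 * delta, r)` matches $\delta^r (2r)!/r!$ for each small $r$, and `geometric_sum(q, d)` stays below $2q$ whenever $q \le 1/2$.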

Thus, if $d^{3/2}\sqrt{2\delta} \le 1/2$, then $E[|H_S(Z) - H_S(X)|] \le 2d^{3/2}\sqrt{2\delta}$. We can now use $(1,2)$-hypercontractivity
for degree $d$ polynomials under the normal distribution (see [Jan97, Remark 5.13]), and bound
$\|H_S(Z) - H_S(X)\|$ as follows.

$$\|H_S(Z) - H_S(X)\|_2 \le e^d\, E\big[|H_S(Z) - H_S(X)|\big] \le 2d^{3/2} e^d \sqrt{2\delta}.$$

If $d^{3/2}\sqrt{2\delta} > 1/2$, we have $E[|H_S(Z) - H_S(X)|^2] \le 2E[H_S^2(Z) + H_S^2(X)] \le 4 \le 8d^{3/2}\sqrt{2\delta}$. In this
regime $2d^{3/2}\sqrt{2\delta} > 1$, so the left-hand side is also at most $4\big(2d^{3/2}\sqrt{2\delta}\big)^2 = 32d^3\delta$. Thus,
either way, we have that there exists a constant $c_d$ such that $\|H_S(Z) - H_S(X)\| \le c_d\sqrt{\delta}$. □